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ABSTRACT 


The small-scale linear information in galaxy samples typically lost during non-linear growth can 
be restored to a certain level by the density field reconstruction, which has been demonstrated for 
improving the precision of the baryon acoustic oscillations (BAO) measurements. As proposed in 
the literature, a joint analysis of the power spectrum before and after the reconstruction enables an 
efficient extraction of information carried by high-order statistics. However, the statistics of the post- 
reconstruction density field are difficult to model. In this work, we circumvent this issue by developing 
an accurate emulator for the pre-reconstructed, post-reconstructed, and cross power spectra (Pre, 
Prost, Peross) up to k = 0.5 h Mpc! based on the DARK QUEST N-body simulations. The accuracy 
of the emulator is at percent level, namely, the error of the emulated monopole and quadrupole of the 
power spectra is less than 1% and 5% of the ground truth, respectively. A fit to an example power 
spectra using the emulator shows that the constraints on cosmological parameters get largely improved 
using Pore +Ppost +Peross With kmax = 0.25 h Mpc-, compared to that derived from Pore alone, namely, 
the constraints on (Qm, Ho, og) are tightened by ~ 4196 — 55%, and the uncertainties of the derived 
BAO and RSD parameters (a1, a), fog) shrink by ~ 28% — 54%, respectively. This highlights the 
complementarity among Pre, Ppost and Peross, Which demonstrates the efficiency and practicability of 
a joint Pore, Poost and Peross analysis for cosmological implications. 


1. INTRODUCTION 


Wide-area spectroscopic surveys are fundamental 
tools for cosmological studies since they enable us to 
probe the Universe both geometrically and dynamically. 


Corresponding author: Yuting Wang 
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In particular, the observed baryon acoustic oscillations 
(BAO) and redshift-space distortions (RSD), which are 
specific three-dimensional clustering patterns of galax- 
ies, can be used to reconstruct the cosmic expansion 
history and the growth rate of the cosmic structure. 
Over the last few decades, massive spectroscopic sur- 
veys, including the Sloan Digital Sky Survey (SDSS) 
(York et al. 2000), the Two-Degree-Field Galaxy Red- 
shift Survey (2dFGRS) (Colless et al. 2001), WiggleZ 
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(Drinkwater et al. 2010), the SDSS-III Baryon Oscilla- 
tion Spectroscopic Survey (BOSS) (Dawson et al. 2013), 
and the SDSS-IV extended Baryon Oscillation Spectro- 
scopic Survey (eBOSS) (Dawson et al. 2016) have proven 
to be a powerful probe for cosmology (Peacock et al. 
2001; Eisenstein et al. 2005; Cole et al. 2005; Percival 
et al. 2007; Blake et al. 2011; Alam et al. 2017, 2021). 

In Fourier space, the BAO feature manifests itself as 
a set of wiggles in the power spectrum, which can be 
used as a standard ruler to measure the cosmic expan- 
sion history. Unfortunately, the BAO feature is gen- 
erally blurred by the nonlinear evolution of the cosmic 
structure, reducing its strength as a cosmic probe. To 
sharpen the BAO feature, the reconstruction scheme was 
proposed (Eisenstein et al. 2007), which effectively re- 
stores the linearity of the density field to a certain extent 
by partially undoing the nonlinear structure evolution. 
This process brings information in high-order statistics 
back to two-point statistics, such that it is not only use- 
ful for boosting the BAO signal, but also helpful for a 
general full-shape analysis of the power spectrum (Hik- 
age et al. 2020). 

Recently, a novel method was proposed (Wang et al. 
2022) to extract information carried by high-order 
statistics from a joint analysis of the power spectrum 
of the pre-reconstructed density field (Pore), the post- 
reconstructed field (Pyost), and the cross-power spec- 
trum between pre- and post-reconstructed fields (Peross)- 
Their analysis, based on the Fisher matrix method, 
showed that a joint analysis using Pore, Ppost and Peross 
can tighten the constraints on the cosmological param- 
eters compared to that using Ppost alone, as part of the 
information from 3 -pt and 4 -pt of the density field can 
be efficiently extracted (Wang et al. 2022). 

In order to exploit the information content from the 
galaxy clustering, an accurate model for the statistics 
of the density field before and after the reconstruction 
is required. Traditional methods for the model build- 
ing rely on the perturbation theory (PT). For Pyre, 
PT-based models can work up to scales of k = 0.2 or 
0.25 h Mpc-!, depending on the effective redshift of 
the galaxy sample (Taruya et al. 2010; Carrasco et al. 
2012; Beutler et al. 2014; d’Amico et al. 2020; Ivanov 
et al. 2020; Chen et al. 2022). However, it is much more 
challenging to build PT-based models that can work on 
the same scales for Pyost or Peross, due to complexities 
brought in by the reconstruction process (Hikage et al. 
2020). One alternative to building PT-based models is 
to develop simulation-based models, i.e., the emulators, 
which have been extensively studied and developed for 
statistics for the pre-reconstructed density fields (Zhai 
et al. 2019; Wibking et al. 2019; Kobayashi et al. 2020; 


Yuan et al. 2022; Winther et al. 2019; Donald-McCann 
et al. 2022; Cuesta-Lazaro et al. 2023; Kwan et al. 2023; 
Cuesta-Lazaro et al. 2023). 

In this work, we develop an emulator for Pj, Poost 
and Peross up to k = 0.5 h Mpc-!, which is trained 
using the DARK QUEST simulations (Nishimichi et al. 
2019) and an halo occupation distribution (HOD) model 
(Zheng et al. 2007). Our emulator is then validated us- 
ing simulations that are not used for the training. Using 
our emulator, we perform a likelihood analysis using the 
monopole and quadrupole of galaxy power spectra up to 
the scale of k = 0.25 h Mpc~', and find a significant in- 
formation gain by a joint (Pre, Poost, Poross} analysis, 
compared to using Ppre alone. 

'This paper is structured as follows: the next section is 
a description of the simulations and galaxy mocks used 
for the training and validation, and Sec. 3 presents the 
details for creating the emulator. In Sec. 4 we perform a 
likelihood analysis using various types of power spectra 
and show the main result of this work, before conclude 
in Sec. 5. 


2. THE DARK QUEST SIMULATIONS AND 
GALAXY MOCKS 


The DARK QUEST simulations that we use to develop 
our emulator are a suite of N-body simulations with 
2048? dark matter particles in 2 h~'Gpc side-length box 
(Nishimichi et al. 2019). The emulator is built using 
a single Dark Quest snapshot at z — 0.549. The cos- 
mologies used in the DARK QUEST simulations cover 
the 100 spatially flat wCDM models with six variable 
parameters and one spatially flat ACDM model with 
the best-fit value of Planck 2015 (Planck Collaboration 
et al. 2016) presented in Table 1, where wy = Qh? and 
We = Oh? are the physical density parameters of baryon 
and cold dark matter, respectively. Qae is the dimen- 
sionless dark energy density parameter. A, and n, are 
the amplitude and slope of the primordial power spec- 
trum, respectively. w is the equation of state parameter 
of dark energy. In addition, the total neutrino mass is 
fixed to $5 m, = 0.06 eV. The effect of massive neutrino 
was included in simulations at the level of linear trans- 
fer function. Cosmological parameters are sampled over 
the parameter range presented in Table 1 using optimal 
maximum distance sliced Latin hypercube designs (Ba 
et al. 2015) so that parameter samplings can cover the 
parameter space as uniformly as possible (Nishimichi 
et al. 2019). We have 15 realizations for the fiducial 
ACDM cosmology. 

The halos were identified using the phase-space 
temporal friends-of-friends halo finder, ROCKSTAR 
(Behroozi et al. 2013). The center of each halo is 
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Parameter  Fiducial value Sampling range 


Wp 0.02225 [0.0211375, 0.0233625] 
We 0.1198 [0.10782, 0.13178] 
Qie 0.6844 [0.54752, 0.82128] 
1n(1019 As) 3.094 [2.4752, 3.7128] 
ns 0.9645 [0.916275, 1.012725] 
w -1 [—1.2, —0.8] 
ClogM 0.596 [0.05, 1.2] 
Mo/Mi 0.1194 [0.0, 0.4] 
a 1.0127 [0.2, 1.5] 
Mı /Mmin 8.1283 [5, 15] 


Table 1. Cosmological and HOD parameters used in our 
emulator. The fiducial cosmological values are from Planck 
2015 (Planck Collaboration et al. 2016). We take the fidu- 
cial HOD parameters based on the fitting to CMASS galaxy 
sample (Manera et al. 2013). The sampling ranges represent 
the bounds on the emulator training set. 


given as the center-of-mass location of a subset of 
member particles in the inner part of that halo, i.e., 
"core particles”, and the velocity of each halo is de- 
fined as the center-of-mass velocity of the core particles. 
Moon = (47 /3)200Amo Room is used as the halo mass 
definition in DARK QUEST, where 5994, is the spherical 
halo boundary radius within which the mean mass den- 
sity is 200 times the mean mass density today Pmo. The 
direct outputs of the ROCKSTAR contain both distinct 
“host” halos and substructures. For the subsequent 
analyses, we remove substructures, which are found 
within the R200m of a more massive nearby halo. 

Galaxy mock catalogs are constructed from the DARK 
QUEST halo catalogs using the halo occupation dis- 
tribution (HOD) framework, which is implemented in 
Halotools (Hearin et al. 2017). We use the functional 
form of HOD as proposed in Zheng et al. (2007) to model 
the mean number (N(M)) of galaxies in halos of mass 
M. 'The mean occupation functions of central and satel- 
lite galaxies are parameterized as 


log M — log Minin 
).o 
Slog M 


(N.(M)) = . h + ert ( 


(NM) = exar (25528) > 2) 
1 

and (N,(M)) = 0 when M < Mo. Mmin is the cut- 
off halo mass scale for hosting central galaxies, with 
(Ne(Mmin)) = 0.5. Giogm describes the profile for 
the halo mass cutoff, making (N.e(M)) smoothly tran- 
sit from 0 to 1. Mo is the minimum halo mass to host 
satellite galaxies. M is the normalization mass scale. 
o is the power-law slope of the satellite HOD at the 
massive end. The occupations of central and satellite 
galaxies are drawn from Bernoulli and Poisson distribu- 
tions, respectively. Central galaxies are placed at the 


halo centers with the same velocities as their host halos, 
where we have ignored the effect of galaxy velocity bias 
(Guo et al. 2015; Guo et al. 2016). We also assume that 
the satellite galaxy distribution within the halos follows 
the Navarro-Frenk-White profile (Navarro et al. 1996). 

We adopt the fiducial values of HOD parameters based 
on the best-fit values (log Mmin = 13.09, Ciog m = 0.596, 
log My = 13.077, log M; = 14.0 and a = 1.0127) ob- 
tained by fitting to CMASS (“constant mass”) galaxy 
sample (Manera et al. 2013). The number density n 
can be derived by performing an integral over the mass 
function, 


dn 
n= f au uy an). (3) 


where dn/dM(M) is the halo mass function. We take 
its fitting formula from Tinker et al. (2008). The re- 
sulting HOD catalog has the number density of n — 
5.6 x 1074 h3 Mpc^?. In our work, we choose to fix the 
number density!, then we sample four out of the five 
HOD parameters of our model. Here we re-parameterize 
HOD parameters as (ag M, Mo/M1, œ, Mı /Mmin} as 
used in Wibking et al. (2020). Their fiducial val- 
ues and flat prior ranges are presented in Table 1. 
We utilize the (randomized) quasi-Monte Carlo method 
to sample re-parameterized HOD parameters in the 
prior range. Specifically, we generate 2450 points in 
4D, using the Sobol sequence (Sobol’ 1967) utility in 
the scipy.stats.qmc package (Virtanen et al. 2020) . 
We scramble the Sobol sequence with a random seed 
searched among integers from 0 to 65535 to minimize 
the mixture discrepancy (Zhou et al. 2013) as the unifor- 
mity measure. The first 2400 HOD samples are assigned 
to 80 cosmologies for training, i.e. each training cosmol- 
ogy is assigned 30 HODs. The remaining 50 HODs are 
assigned to each testing cosmology, yielding a testing set 
of 1000 models. For each sampling, we use Eq. 3 to find 
the value of log Mmin that yields the fixed n. 


3. EMULATING PRE- AND 
POST-RECONSTRUCTED GALAXY POWER 
SPECTRA 


In this section, we use the galaxy samples described 
in previous section to emulate Ppre, Poost and Poross of 
galaxies. We first present the measurement of power 


1 Another way is allowing the number density n to vary, then 
including the information in n by adding a Gaussian prior for 
n into the likelihood (Lange et al. 2022; Donald-McCann et al. 
2022). Varying n would weaken the constraints on HOD parame- 
ters to some extent, depending on the uncertainty on n, but has 
a negligible impact on cosmological parameters (Donald-McCann 
et al. 2022). 
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spectra with and without the density field reconstruc- 
tion, then detail the training process of our emulator, 
and finally discuss the performance of the emulator. 


3.1. The density field reconstruction and the power 
spectrum measurement 


Before performing the density-field reconstruction, we 
implement the Alcock-Paczynski (AP) effect (Alcock & 
Paczynski 1979), which arises from the discrepancy be- 
tween the fiducial cosmology used for redshift-distance 
conversion and the underlying true cosmology. Al- 
though the equation relating the power spectrum before 
and after applying the AP effect is analytically known 
(Ballinger et al. 1996), including this effect in the recon- 
struction is complicated and requires nontrivial mod- 
elling (Sherwin & White 2019). An easier way to ac- 
count for the AP effect is to manipulate the catalog by 
changing the coordinates of the samples. Specifically, we 
convert the galaxy positions in the true coordinate x’ to 
the “observed” coordinate x, and stretch the size lengths 
of simulation box L using the relations of x = A^!x' 
and L + A-!L with 


QI 0 0 
Da(z) Haa(z) 
A= = = 
0 a, 0 QL Dasal) 9I H(z) (4) 
0 0 ed 


where D4 and H are the comoving angular diameter 
distance and Hubble parameter, and quantities with 
subscript *fid" denote those in the fiducial cosmology. 
The galaxy density field is smoothed by convolving with 
the kernel K(k) = exp [-(k Us)?/2] in Fourier space, 
where k is the modulus of the conjugate wavenumber k 
of the observed coordinate x, and we set the smooth- 
ing scale to be X, = 10 h-! Mpc. The displacement 
field is then estimated using the Zeldovich approxima- 
tion, ie, &(k) = —i =K (k), where (k) denotes 
the nonlinear redshift-space galaxy overdensity in the 
observed coordinate, bin is the input linear bias of the 
galaxy sample, and fin is the input logarithmic growth 
rate. An inverse Fourier transformation on S returns 
the configuration-space shift field s(x), which is used 
to move both galaxies and randoms. Although it is 
natural to use the true (fiducial) values of b and f as 
bin and fin for the reconstruction, this does not have 
to be the choice. Actually, the true values of b and f 
are not known before performing the analysis. As we 
shall demonstrate later, the final parameter estimation 
is largely insensitive to the choice of bin and fin. In what 
follows, we use the fiducial b and f to start with, and 
repeat the analysis with a significantly different set of 
bin and fi, to demonstrate the robustness of the final 
result against the choice of these input parameters. 


We measure the multipoles of Pore, Poost and Peross 
using a fast Fourier transform (FFT)-based estimator 
(Hand et al. 2017) implemented in nbodykit (Hand 
et al. 2018). The number density field of galaxies is 
constructed using the cloud-in-cell (CIC) scheme to as- 
sign galaxies to the grid, and we correct for the alias- 
ing effect using the interlacing scheme in Sefusatti et al. 
(2016). For the monopole of the auto power spectrum 
before and after density field reconstruction, the shot- 
noise is removed as a constant. Note that the shot-noise 
of Pues, is scale-dependent (Wang et al. 2022), which 
is estimated using the “half-sum (HS) half-difference 
(HD)” approach and then subtracted off, as in Ando 
et al. (2018); Wang et al. (2022). The k-bin width is set 
to be Ak = 0.01 h Mpc ! for all P(k) measurements. 


3.2. Emulating the power spectra 


In order to avoid the emulated quantities spanning 
several orders of magnitude, we choose to normalize the 
power spectrum multipoles using the linear Kaiser power 
spectrum (Kaiser 1987) with the BAO feature removed, 
i.e. 


RS = n (5) 
9 (b2 2/3bf + 1/5f2) Pawlin ’ 
RE = p (6) 
27 (4/3bf + 4/7f2) Paw tin ` 
X 
RŠ P; (7) 


g (8/15 f?) Pawlin f 


where Pawlin is the linear power spectrum without the 
BAO feature (Eisenstein & Hu 1998). The superscript 
^X" runs for “{pre, post, cross}”. To well capture the 
BAO wiggles in the monopole, we decompose RX into 
two parts, i.e. the smoothed broadband shape (S) part 
and the BAO wiggles (W) part. The S part is obtained 
by applying a Savitzky-Golay filter (Savitzky & Golay 
1964) to RX, i.e. fitting to a certain number of data 
points (NV) with a polynomial of p-th order, and we 
find that N = 41 and p = 4 is a reasonable choice for 
the filtering. Then the BAO wiggles are extracted, i.e. 
Wx = RX — SX. Fig. 5 in the Appendix shows the ob- 
servables (2400 in total) used for training the emulator. 

We follow Zhai et al. (2019, 2023) to construct the em- 
ulator, based on the George (Ambikasaran et al. 2016 
code. In the GP modelling, the correlation between dif- 
ferent training data points is modelled by a covariance 
matrix generated by a kernel function. This is of critical 
importance in the GP modelling since it defines the func- 
tion we wish to learn. Due to the lack of prior knowledge 
of the correlation between training data points, the def- 
inition of the kernel function can be arbitrary. It turns 
out that for the modelling of galaxy power spectrum in 
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Figure 1. Upper panels: The prediction from our emulator for multipole moments of Pore, Poost and Peross for the fiducial 
cosmology that is not used for the training. The symbols are the average of 15 realizations in the fiducial cosmology. The errors 
are the statistical errors for a volume of 3 h^? Gpc?. Lower panels: The fractional difference between the emulator prediction 
and the measured power spectra from mocks in the fiducial cosmology. 


this work, a Matern class kernel (K) is a proper choice 
for accurate predictions. In this model, the hyperpa- 
rameters in the kernel define the strength of correlation 
between neighboring points. The following process of 
training is to optimize the hyperparameters in the ker- 
nel function: 


1 1 1 
InfL= = P MP E In|M|— aN m27, (8) 


where M = K + e?I, K is the covariance matrix popu- 
lated by the kernel function, and o represents the error 
of the training data P. Since each cosmology in the 
training data has only one realization, we estimate the 
uncertainty of the training data using the fiducial cos- 
mology with 15 realizations. With the optimized hyper- 
parameters fed into the kernel function, we can obtain 
the power spectra for an arbitrary point in the parame- 
ter space. 


3.3. Emulator validation 


In Fig. 1, we show the prediction from our emula- 
tor for multipole moments of Pore, Ppost and Peross for 
the fiducial cosmology that is not used for the train- 
ing. The symbols in the upper panels are the average of 
power spectra measured from 15 realizations in the fidu- 
cial cosmology. The error bars are the statistical errors 
computed using Eq. 11. 

The lower panels of Fig. 1 show the fractional dif- 
ference between the emulated and the measured power 
spectra from the galaxy mocks. It indicates that the 
monopole and quadrupole measured from the galaxy 


mocks can be well described by our emulator, by better 
than 1-2% for the monopole and 2-5% for the quadrupole 
at most scales. For the hexadecapole, the fractional dif- 
ference is noisy because the amplitudes of hexadecapole 
are close to zero. Within the statistical errors, our emu- 
lator gives an excellent prediction for the hexadecapole 
as well. 

We quantify the accuracy of our emulator by compar- 
ing the emulator predictions with the power spectrum 
multipoles measured from the mocks in the testing set, 
which includes 20 cosmologies not used in the training 
set, and each cosmology is assigned 50 HODs. Thus 
the testing set has 1000 measurements for each type of 
power spectrum. 

Fig. 2 shows the performance of our emulator for Pore 
(left), Psost (middle) and Pass. (right). The symbols 
in the upper panels of Fig. 2 show the average of frac- 
tional errors of the monopole power spectrum obtained 
by comparing the emulator predictions with the mea- 
surements from 1000 testing mocks. The fractional error 
is within ~ 1-2% over most scales. The solid lines show 
the inverse signal-to-noise ratio computed using the av- 
erage of 15 fiducial monopole measurements. Since the 
quadrupole and hexadecapole moments can cross zero, 
we instead show the difference between the emulator pre- 
diction and the measurement from the testing mocks, 
relative to the statistical error in the middle and lower 
panels of Fig. 2. We find that the emulator error is 
sub-dominant, roughly 50 — 70% of the statistical error 
for a volume of 3 h-? Gpc?. 
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Figure 2. Upper panels: The average of the fractional difference between the emulated and the measured monopole power 
spectrum from 1000 testing mocks. The black solid lines show the inverse signal-to-noise ratio of the mean fiducial monopole 
measurement. The statistical error for monopole power spectrum op, is computed using Eq. 11. Middle panels: The average 
of the difference between the emulated and the measured quadrupole power spectrum relative to the statistical error. Lower 
panels: The average of the difference between the emulated and the measured hexadecapole power spectrum relative to the 


statistical error. 


kmax = 0.25 kmax = 0.5 
bin, fin (baa; fna) (0.9 baa, 0.7 faa) (bna; faa) 
Pore Pbost Poross Ppre4 post Pau Pan Pbost 
P-factor 1.09 1.09 1.09 1.22 1.36 1.36 1.21 
Qm 0.318 + 0.0110 0.315 + 0.0098 0.317 + 0.0100 0.318 + 0.0082 0.320 + 0.0061 0.320 + 0.0080 0.317 + 0.0078 
Ho 67.13 + 0.84 67.00 + 0.54 67.05 + 0.69 67.15 + 0.52 67.06 + 0.49 67.12 + 0.49 67.02 + 0.57 
og 0.849 + 0.038 0.833 + 0.029 0.842 + 0.040 0.834 + 0.018 0.834 + 0.017 0.834 + 0.018 0.831 + 0.016 
aL 1.0008 + 0.0115 | 1.0039 + 0.0088 1.0022 + 0.0097 1.0004 + 0.0083 1.0013 + 0.0076 | 1.0003 + 0.0081 | 1.0028 + 0.0090 
alj 1.0000 + 0.0118 | 1.0041 + 0.0106 1.0016 + 0.0104 0.9996 + 0.0097 0.9999 + 0.0084 | 0.9987 + 0.0098 | 1.0024 + 0.0103 
fos 0.497 + 0.023 0.487 + 0.018 0.492 + 0.024 0.488 + 0.0116 0.488 + 0.0104 0.488 + 0.0114 0.486 + 0.010 


Table 2. The constraints on derived cosmological parameters (Qm, Ho, o8) and BAO and RSD parameters (a, a1), fog) using 
different data sets. The fiducial values of the parameters derived are Qm = 0.3156, Ho = 67.24, og = 0.831, a, = 1, aj) = 1 and 
fos(z = 0.549) = 0.485, which are well recovered in all cases. The factor P here is calculated using Eq.13. Our default choices 
of (b, f) parameters for reconstruction are bfa = 1.824 and faa = 0.778 determined in the fiducial cosmology. To explore the 


effect of these inputs, we vary the bias by —10% (i.e. 0.9 bsa) and the f by —30% (i.e. 0.7 faa). 


catalogs in the fiducial cosmology, which are not in the 
training set. We use Cobaya (Torrado & Lewis 2021) 
to perform a Markov chain Monte Carlo (MCMC) sam- 
pling of the 9-dimensional parameter space within the 


4. COSMOLOGICAL APPLICATION TO MOCK 
CATALOGS 


In this section, we test our emulator by applying it 
to the power spectrum measurements from mock galaxy 
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Figure 3. Left: The 1D posterior distribution and 2D contour plots showing 68% and 95% credible regions for Qm, Ho and 
og using the pre-reconstructed power spectrum alone (grey), post-reconstructed power spectrum alone (red), and joint result of 
pre, post and cross power spectra (blue). Right: The same plot derived from the post-reconstructed power spectrum alone with 
two choices of kmax = 0.25 h Mpc^! (red) and kmax = 0.5 h Mpc^! (blue). 
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Figure 4. Left: The 1D posterior distribution and 2D contour plots showing 68% and 95% credible regions for the derived 
aL, a; and fog parameters using the pre-reconstructed power spectrum alone (grey), post-reconstructed power spectrum alone 
(red), and joint result of pre-, post- and cross-power spectra (blue). Right: The result using the post-reconstructed power 
spectrum alone with two choices of kmax = 0.25 h Mpc^! (red) and kmax = 0.5 h Mpc^! (blue). 


flat ACDM framework, i.e. the w parameter is fixed to 
—1. The following x? gets minimised in the fitting, 


and we add a Gaussian prior for wy centered on 0.0223 
with the width 0.00036 from BBN constraints (Mossa 
et al. 2020) and a Gaussian prior for n, parameters cen- 
tered on 0.965 with the width 0.0042 from Planck con- 
straints (Aghanim et al. 2020). Here Pamy is the predic- 
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tion of the emulator, and Pyea denotes the average of 
power spectra measured from 15 realizations in the fidu- 
cial cosmology. C is the covariance matrix consisting of 
two terms, 


C = Caata H 02a] (10) 


emu ? 


where Cgata is the contribution of the sample statis- 
tics, and Gemu corresponds to the uncertainty due to 
the emulating error in the model prediction. Since the 
emulator is constructed for individual scale bins, here 
we assume that emulating error is independent among 
different scale bins, which is computed using the test- 
ing set as discussed in Sec.3.3. As the DARK QUEST 
simulations only have 15 realizations in the fiducial cos- 
mology which are not enough to construct a robust co- 
variance matrix for galaxy clustering analysis, we com- 
pute the correlation matrix using GLAM simulations 
(Klypin & Prada 2018) which have 986 independent real- 
izations. We adopt the best-fit HOD parameters for the 
M; < —21.6 CMASS samples in Guo et al. (2014), lead- 
ing to a contribution of shot noise (~ 21074 A? Mpc^?) 
to the covariance. The side length of the GLAM sim- 
ulation box is 1 h^! Gpe. To be close to the volume 
of BOSS survey (Alam et al. 2017), the data covariance 
matrix Cgata is rescaled by a factor of 3. Specifically, we 
derive the Cgata from GLAM mocks, i.e. 


N, 
(Ge Juss = z 1 5 [Pr (ki) — Pe(ki)| x 
[Pi (k;) — Po (k;)] , (11) 


where the mean of power spectra is defined as 
js 
Pe(ki) = N. 5 P? (ki), (12) 
5 n=1 


and N, = 986 is the number of mocks. Since the co- 
variance matrix Cgata is estimated from finite number 
of mocks, C444 is generally biased. To correct, we mul- 
tiply Caata by a factor of P (Percival et al. 2022), 

(Ns — 1)[1 + BCNa — Ne] 


= 1 
B Ns — Nac-Ne—1 : (13) 


with, 


N,— Na —2 
B= . 14 
(Ns — Na — 1)(N, — Na — 4) uS 


Here, N, is the number of simulations used to estimate 
the covariance, Ng is the number of the data vector, and 
Ne is the number of parameters that are being fitted. 
Note that the P-factor generally dilutes the constraints 


on parameters being fitting by rescaling the covariance, 
namely, when using Pore, Ppost, or Peross alone, the P- 
factor increases the covariance by 996. For joint analyses 
of Pore + Prost and Pore + Prost + Poross (i.e. Pau), the 
P-factor enlarges the covariance by 22% and 36%, re- 
spectively. 

Using k modes at k < 0.25 h Mpc! for both the 
monopole and quadrupole of the power spectra’, we ob- 
tain the 1D posterior distributions and 2D contour plots 
for the derived cosmological parameters Qm, Ho and og, 
as shown in Fig. 3. The mean values with 68% credible 
intervals of Qm, Ho and og are presented in Table 2. The 
left contour plot in Fig. 3 shows a comparison of the 
fitting results using the pre-reconstructed power spec- 
trum alone (grey), post-reconstructed power spectrum 
alone (red), and the joint fitting of pre-, post-, and cross- 
power spectra (blue) for kmax = 0.25 h Mpc™!. Fig. 3 
shows that our emulator-based analysis can recover the 
expected values of cosmological parameters within sta- 
tistical errors. The post-reconstructed power spectrum 
alone is more informative, tightening the constraints on 
Om, Ho and og by 10.9%, 35.7% and 23.7%, respectively 
compared to that from Pre alone. It is found that the 
joint fit of the pre-, post-, and cross-power spectra, de- 
notes as P,, gives the tightest constraint, namely, the 
constraints on Nm, Ho and og from P4j are improved 
by 44.5%, 41.7%, and 55.3%, respectively, compared to 
that from Ppre alone. 

The relative information gain from Pa compared to 
that from Ppre is expected to be greater, as we include 
more modes on smaller scales. However, given the num- 
ber of mocks and data points, we do not go further than 
kmax = 0.25 h Mpc! for a Pan analysis. Instead, we 
perform a P,os¢-alone analysis with kmax = 0.5 h Mpc ^! 
for a demonstration. We compare the constraints on 
Om, Ho and og using Ppost alone for kmax = 0.25 and 
0.5 h Mpc! in the right panel of Fig. 3. As shown, 
adding modes on smaller scales helps to constrain og, 
namely, its uncertainty gets reduced by 44.8% as kmax 
increases from 0.25 to 0.5 h Mpc-!. Also, adding more 
modes does not generate bias in the posteriors, demon- 
strating the robustness of our emulator. 

We then derive the BAO and RSD parameters (a_, 
oj, fog), and show the 1D posterior distributions and 
2D contour plots in Fig. 4, with the mean values and 
6896 credible intervals of the BAO and RSD parame- 
ters listed in Table 2. Compared to Ppre alone, the 
constraints on (o1, oj, fog) parameters from Pan are 
improved by 33.996, 28.896, and 54.896, respectively. 


1 


2 Unless otherwise mentioned, kmax is 0.25 h Mpc” as a default 


setting for the analysis in this work. 


EMULATOR FOR GALAXY POWER SPECTRUM 9 


P ost alone gives a tighter constraint than that using 
Pore only, but is outnumbered by Pay by 13.6% for aL, 
20.8% for aj and 42.2% for fog. The right panel of 
Fig. 4 shows the contours of the derived BAO and 
RSD parameters with two different choices of kmax, 
as in Fig. 3. As expected, adding small-scale modes 
(k € [0.25, 0.5] h Mpc~?) helps to tighten the constraint 
on fog significantly, namely, the uncertainty gets re- 
duced by 44.4%. Note that this level of constraint can 
be achieved by using Pay with kmax = 0.25 h Mpc7!. 

We confirm that the information content in Peross is 
complementary to that in Pore and Ppost, as claimed in 
Wang et al. (2022). Specifically, adding Pao to our 
joint analysis using Ppre and Pyos improves the con- 
straints on (Qm, Ho, ag) and (a1, oj, fos) by 5.5%- 
25.696, as presented in Table 2. 

Since the BAO reconstruction process requires a pair 
of input b and f, denoted as bin and fin, the recon- 
structed power spectrum depends on bin and fin. One 
natural question is whether and how much the final pos- 
terior depends on bin and fin. To investigate, we use a 
set of bin and fi, that are significantly different from the 
fiducial b and f, namely, bin = 0.9 bfa and fin = 0.7 faa. 
Note that this level of deviation from the true values is 
much greater than that constrained by a typical galaxy 
survey such as BOSS (Beutler et al. 2017), thus is suf- 
ficient to study the impact of using ‘wrong’ cosmologi- 
cal parameters for the reconstruction on the final result 
(Sherwin & White 2019). We repeat our analysis using 
this set of bin and fin, and show the parameter constraint 
from P,y in this case in Table 2 and in Fig. 6 in the Ap- 
pendix. As shown, the constraint is largely unchanged, 
demonstrating the robustness of our method against the 
choice of bin and fin. 

For completeness, we show the full contour plot for all 
parameters, including the cosmological and HOD pa- 
rameters, in Fig. 7 in the Appendix using different 
combinations of the power spectra. As expected, Pay 
provides the tightest constraint for all parameters, as 
predicted by the Fisher matrix analysis (Wang et al. 
2022). 

Results presented so far do not include information 
from £4, the hexadecapole, so it is useful to explore how 
P4 can help to reduce the uncertainties. We perform an 
additional analysis using Pan including P4 for all types 
of power spectra with kmax = 0.25 h Mpc-!, and find 
that P4 can barely further improve the constraints on 
cosmological parameters, as shown in Fig. 8 in the Ap- 
pendix. 


5. CONCLUSION AND DISCUSSIONS 


In this work, we develop an emulator for galaxy power 
spectra for catalogs with and without the BAO recon- 
struction based on the DARK QUEST simulations with 
HOD models to populate galaxies. The theoretical pre- 
dictions of power spectra derived from our emulator are 
in excellent agreement with the ground truth (with a 
deviation less than 5%). Our emulator-based likelihood 
analysis on mock galaxy catalogs demonstrates that in- 
put cosmological parameters can be accurately recovered 
from power spectra up to scales of k = 0.5 h Mpc-!. 

Our analysis shows that Pore, Ppost and Peross are 
highly complementary, thus jointly using these power 
spectra can significantly improve constraints on cosmo- 
logical parameters, which is consistent with the claim 
based on a Fisher matrix analysis (Wang et al. 2022). 
Specifically, the uncertainty of (Qm,Hpo,08) derived 
from Py + Poost + Peross gets tightened by 44.5%, 
41.7% and 55.3%, respectively, compared to that de- 
rived from Pore (kmax = 0.25 h Mpc ^! in all cases). 
The derived BAO and RSD parameters, a1,a); and 
fog, are better determined by 33.9%, 28.8% and 54.8%, 
respectively. Adding small-scale modes to the analysis 
helps to constrain parameters related to the amplitude 
of power spectra. For example, extending kmax = 0.25 
to 0.5 h Mpc? for Prost reduces the uncertainty on og 
and fog by 44.8% and 44.4%, respectively. We also find 
that the posteriors of parameters are largely insensitive 
to input values of b and f, which are required for the 
BAO-reconstruction process. 

The methodology and pipeline developed in this work 
make it possible to extract high-order information from 
two-point statistics, which is of significance for cosmo- 
logical studies. Our method and emulator can be di- 
rectly applied to existing and forthcoming galaxy sur- 
veys including BOSS (Dawson et al. 2013), eBOSS 
(Dawson et al. 2016), DESI (Dark Energy Spectroscopic 
Instrument, Aghamousa et al. 2016a,b), PFS (Prime Fo- 
cus Spectrograph, Takada et al. 2014) and so forth, af- 
ter the required tuning in the emulation process for the 
number density, effective redshifts of the galaxy samples 
etc., which is technically straightforward. 
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APPENDIX 


This appendix includes four figures, with information detailed in the figure captions. 
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Ct 


Figure 5. The complete training set for our emulator, which consists of 2400 power spectrum multipoles for Ppre (left column), 
Ppost (middle) and Peross (right column). All spectra have been properly normalized by power spectra derived from the linear 
Kaiser formula, so that their amplitudes are within a narrow range. The normalized monopole, Rj, is divided into a smoothed 
shape part (55) and a BAO “wiggles” part (W). More details are presented in the main text and Eq. (5). 
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Figure 6. The 1D posterior distribution and 2D contour plots showing 68% and 95% credible regions for the derived parameters 
(Om, Ho, ca, a1, oq, fog) from Pan reconstructed using two different sets of bi; and fin shown in the legend. The dashed lines 
show the expected values of the parameters. 
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Figure 7. The 1D posterior distribution and 2D contour plots showing 68% and 95% credible regions for cosmological and 
HOD parameters derived from combinations of different types of power spectra shown in the legend. The dashed lines show the 
expected values of the parameters. 
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Figure 8. The 1D posterior distribution and 2D contour plots showing 68% and 95% credible regions for (Qm, 


fos) using Pay with and without the hexadecapole. The dashed lines show the expected values of the parameters. 


